Alignment of single-cell trajectory trees with CAPITAL

Global alignment of complex pseudotime trajectories between different single-cell RNA-seq datasets is challenging, as existing tools mainly focus on linear alignment of single-cell trajectories. Here we present CAPITAL (comparative analysis of pseudotime trajectory inference with tree alignment), a method for comparing single-cell trajectories with tree alignment whereby branching trajectories can be automatically compared. Computational tests on synthetic datasets and authentic bone marrow cell datasets indicate that CAPITAL achieves accurate and robust alignments of trajectory trees, revealing various gene expression dynamics, including conservation of gene–gene correlations between different species.


Supplementary Notes
Supplementary Note 1
Proposition 1. CAPITAL runs faster than all-against-all alignment of single lineages if input trajectories have one or more branches.
Proof. For simplicity, we assume that the numbers of clusters, leaves and cells in one trajectory tree are the same as in the other, and denote them by N, L and n, respectively.
CAPITAL first computes a cluster tree alignment in O(N^2) time. Second, diffusion pseudotime of O(n) single cells is calculated per branch by solving the eigenvalue problem [1]. In contrast, the all-against-all alignment of single lineages requires O(L^2) comparisons, each of which takes O(n^3) + O(n^2) steps as discussed above. Hence, it runs in O(L^2 n^3) time.
Therefore, CAPITAL can run faster than the naive all-against-all linear alignment by a factor of O(L).
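As a rough numerical illustration of this argument, the following Python sketch tallies idealized step counts with constants ignored. It assumes (this is an interpretation, not something stated explicitly in the proof) that CAPITAL performs one pseudotime computation and one linear alignment per aligned branch, L in total, each costing n^3 + n^2 steps. The function names are hypothetical.

```python
def capital_steps(N, L, n):
    """Idealized step count for CAPITAL (assumed form: N^2 + L * (n^3 + n^2))."""
    tree_alignment = N ** 2            # cluster tree alignment
    per_branch = n ** 3 + n ** 2       # eigenvalue problem + linear alignment
    return tree_alignment + L * per_branch


def all_against_all_steps(L, n):
    """Idealized step count for aligning every lineage pair: L^2 * (n^3 + n^2)."""
    return L ** 2 * (n ** 3 + n ** 2)


if __name__ == "__main__":
    N, L, n = 10, 4, 100
    ratio = all_against_all_steps(L, n) / capital_steps(N, L, n)
    # When the n^3 term dominates, the ratio approaches L.
    print(f"speed-up factor: {ratio:.3f} (L = {L})")
```

Under these assumptions the ratio tends to L whenever the per-branch cost dominates the N^2 term, which matches the O(L) factor stated above.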

Supplementary Note 2
Assume that the nodes in tree T_k (k = 1, 2) are numbered 1 through |T_k| in postorder. The following pseudocode computes an optimal tree alignment distance between unordered trees with bounded degrees:
Initialize with D(θ, θ) = 0.

The final value of D obtained in the above pseudocode is the optimal alignment distance between T_1 and T_2. The optimal tree alignment itself is recovered by a traceback procedure that starts from this value and follows traceback pointers, which record the operation attaining the minimum in each dynamic programming recursion.

Fig. 6a in the main text.

Supplementary Fig. 12 Pseudotime aligned kinetics for genes with similar patterns along the paths obtained from the aligned trajectories in the human-mouse bone marrow cell datasets. These are a set of genes that show an increasing tendency of expression in both the human and mouse datasets. alignment001, (0/HSC, 7/HSC)→(11/Mono, 19/Mono); alignment003, (0/HSC, 7/HSC)→(3/Neutro, 6/Neutro); alignment008, (0/HSC, 7/HSC)→(9/Ery, 5/Ery); HSC, hematopoietic stem cell; Ery, erythrocyte; Mono, monocyte; Neutro, neutrophil.

Supplementary Fig. 13 Pseudotime aligned kinetics for genes with different patterns along the paths obtained from the aligned trajectories in the human-mouse bone marrow cell datasets. a, These are a set of genes that show an increasing tendency of expression in the human dataset and a decreasing tendency in the mouse dataset, and b, vice versa. Each linear alignment is the same as in Supplementary Fig. 12.

... patterns along pseudotime between human and mouse bone marrow cells. They are colored by p-values computed by Metascape [5], where one-sided statistical tests based on the hypergeometric distribution were performed. The terms for Ery, Mono and Neutro are derived from the genes that show an increasing tendency of expression for the respective alignment paths in both the human and mouse datasets. a, Human enriched clusters. b, Mouse enriched clusters. Ery, erythrocyte; Mono, monocyte; Neutro, neutrophil.

Supplementary Fig. 15 A schematic of the dynamic programming (DP) recursion for aligning trees shown in Eq. (7) in the main text. For simplicity, an example of computing a distance between only binary trees is shown. a, The case where nodes i ∈ V(T_1) and j ∈ V(T_2) are matched. A black dashed curve connects matching nodes across the trees. b, The case where node i ∈ V(T_1) has no matching node in tree T_2(j). c, The case where node j ∈ V(T_2) has no matching node in tree T_1(i).

Supplementary Fig. 16 A schematic of the DP recursion for aligning forests shown in Eq. (8) in the main text. For simplicity, an example of computing a distance between forests A = {T_1(i_1), T_1(i_2)} and B = {T_2(j_1), T_2(j_2)}, i.e. consisting of only binary trees, is shown. a, The case where all forests and trees across the two inputs are matched. b, The case where B′ = {T_2(j_q)} (e.g. q = 2), and node i_p ∈ V(T_1) (e.g. p = 2) has no matching node in forest B. c, The case where A′ = {T_1(i_p)} (e.g. p = 2), and node j_q ∈ V(T_2) (e.g. q = 2) has no matching node in forest A.
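
The three tree cases and three forest cases described in the captions above can be illustrated with a small memoized recursion. The Python sketch below is not the pseudocode of Supplementary Note 2 and may differ in detail from Eqs. (7) and (8) in the main text. It is a minimal illustration of an alignment distance between unordered trees, under several assumptions: trees are given as hypothetical dicts mapping a node id to (label, children), labels are never None, and the cost function `unit_cost` (0 for equal labels, 1 otherwise, with None standing for a gap) is a placeholder. It enumerates subsets of sibling subtrees, so it is only practical for trees with small, bounded degrees.

```python
from functools import lru_cache
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` (empty and full included) as a frozenset."""
    items = tuple(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def unit_cost(a, b):
    """Placeholder label cost: 0 if labels are equal, else 1 (None is a gap)."""
    return 0 if a == b else 1


def tree_alignment_distance(t1, root1, t2, root2, cost=unit_cost):
    """Alignment distance between two unordered rooted trees.

    Each tree is a dict: node id -> (label, tuple of child ids).
    """
    def kids1(i):
        return frozenset(t1[i][1])

    def kids2(j):
        return frozenset(t2[j][1])

    @lru_cache(maxsize=None)
    def del1(i):
        # Cost of aligning the whole subtree T1(i) with the empty tree.
        return cost(t1[i][0], None) + sum(del1(c) for c in t1[i][1])

    @lru_cache(maxsize=None)
    def ins2(j):
        # Cost of aligning the empty tree with the whole subtree T2(j).
        return cost(None, t2[j][0]) + sum(ins2(c) for c in t2[j][1])

    @lru_cache(maxsize=None)
    def tree(i, j):
        # (a) i and j are matched; their child forests are aligned.
        best = forest(kids1(i), kids2(j)) + cost(t1[i][0], t2[j][0])
        # (b) i has no match: T2(j) is aligned inside one child subtree of i.
        for c in t1[i][1]:
            best = min(best, del1(i) - del1(c) + tree(c, j))
        # (c) j has no match: T1(i) is aligned inside one child subtree of j.
        for c in t2[j][1]:
            best = min(best, ins2(j) - ins2(c) + tree(i, c))
        return best

    @lru_cache(maxsize=None)
    def forest(A, B):
        # A and B are frozensets of root ids of sibling subtrees.
        if not A:
            return sum(ins2(j) for j in B)
        if not B:
            return sum(del1(i) for i in A)
        best = float("inf")
        # (a) one subtree of A is aligned with one subtree of B.
        for p in A:
            for q in B:
                best = min(best, forest(A - {p}, B - {q}) + tree(p, q))
        # (b) root p has no match; its children are aligned with a subset of B.
        for p in A:
            for Bp in subsets(B):
                best = min(best, forest(A - {p}, B - Bp)
                           + forest(kids1(p), Bp) + cost(t1[p][0], None))
        # (c) root q has no match; its children are aligned with a subset of A.
        for q in B:
            for Ap in subsets(A):
                best = min(best, forest(A - Ap, B - {q})
                           + forest(Ap, kids2(q)) + cost(None, t2[q][0]))
        return best

    return tree(root1, root2)
```

For example, with `t1 = {0: ("a", (1, 2)), 1: ("b", ()), 2: ("c", ())}` and `t2 = {0: ("a", (1,)), 1: ("x", (2, 3)), 2: ("b", ()), 3: ("c", ())}`, the call `tree_alignment_distance(t1, 0, t2, 0)` accounts for the single unmatched node "x". Recovering the alignment itself would additionally require storing which case attained each minimum, in the spirit of the traceback pointers mentioned in Supplementary Note 2.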